default prediction
A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk
Afolabi, Ayomide, Ogburu, Ebere, Kimitei, Symon
This study evaluates the performance of various classifiers in three distinct models (response, risk, and response-risk) for credit card mail campaigns and default prediction. In the response model, the Extra Trees classifier achieves the highest recall (79.1%), emphasizing its effectiveness in identifying potential responders to targeted credit card offers. In the risk model, the Random Forest classifier attains a specificity of 84.1%, which is crucial for identifying customers least likely to default. In the multi-class response-risk model, the Random Forest classifier achieves the highest accuracy (83.2%), indicating its efficacy in discerning both potential responders to credit card mail campaigns and low-risk credit card users. In this study, we optimized various performance metrics to solve a specific credit risk and mail responsiveness business problem.
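As a reading aid for the metrics quoted in this abstract, the following minimal sketch computes recall and specificity from confusion-matrix counts. The counts and function names are illustrative assumptions, not figures or code from the study.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual positives (e.g. responders) that the model flags."""
    total = tp + fn
    return tp / total if total else 0.0


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives (e.g. non-defaulters) correctly left unflagged."""
    total = tn + fp
    return tn / total if total else 0.0


# Hypothetical confusion-matrix counts, chosen only for illustration.
print(f"recall      = {recall(tp=80, fn=20):.3f}")
print(f"specificity = {specificity(tn=90, fp=10):.3f}")
```

Recall matters most when missing a responder is costly, while specificity matters most when wrongly flagging a reliable customer is costly, which is why the two models above are judged on different metrics.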
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Enhancing Credit Default Prediction Using Boruta Feature Selection and DBSCAN Algorithm with Different Resampling Techniques
Ampomah, Obu-Amoah, Agyemang, Edmund, Acheampong, Kofi, Agyekum, Louis
This study examines credit default prediction by comparing three techniques, namely SMOTE, SMOTE-Tomek, and ADASYN, that are commonly used to address the class imbalance problem in credit default situations. Recognizing that credit default datasets are typically skewed, with defaulters comprising a much smaller proportion than non-defaulters, we began our analysis by evaluating machine learning (ML) models on the imbalanced data without any resampling to establish baseline performance. These baseline results provide a reference point for understanding the impact of subsequent balancing methods. In addition to traditional classifiers such as Naive Bayes and K-Nearest Neighbors (KNN), our study also explores the suitability of advanced ensemble boosting algorithms, including Extreme Gradient Boosting (XGBoost), AdaBoost, Gradient Boosting Machines (GBM), and Light GBM for credit default prediction using Boruta feature selection and DBSCAN-based outlier detection, both before and after resampling. A real-world credit default data set sourced from the University of Cleveland ML Repository was used to build ML classifiers, and their performances were tested. The criteria chosen to measure model performance are the area under the receiver operating characteristic curve (ROC-AUC), area under the precision-recall curve (PR-AUC), G-mean, and F1-scores. The results from this empirical study indicate that the Boruta+DBSCAN+SMOTE-Tomek+GBM classifier outperformed the other ML models (F1-score: 82.56%, G-mean: 82.98%, ROC-AUC: 90.90%, PR-AUC: 91.85%) in a credit default context. The findings establish a foundation for future progress in creating more resilient and adaptive credit default systems, which will be essential as credit-based transactions continue to rise worldwide.
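The interpolation step at the heart of SMOTE can be sketched as follows. This is a simplified illustration in plain Python, assuming numeric features and Euclidean distance; it omits the Tomek-link cleaning of SMOTE-Tomek and the density weighting of ADASYN, and the function name `smote_like` and the toy data are hypothetical rather than taken from the study.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between
    a randomly chosen minority point and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (e.g. defaulters) with two hypothetical features.
defaulters = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_like(defaulters, n_new=3, k=2, seed=1):
    print(point)
```

Because each synthetic point lies on a segment between two real minority samples, the method fills in the minority region instead of duplicating existing rows. In practice a maintained library implementation would normally be used instead of a hand-rolled one.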
- North America > United States > Michigan > Kalamazoo County > Kalamazoo (0.04)
- North America > United States > Texas > Hidalgo County > Edinburg (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Banking & Finance > Credit (0.70)
- Information Technology (0.68)
- Banking & Finance > Loans (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction
Lu, Yi, Ling, Aifan, Wang, Chaoqun, Xu, Yaxin
In recent years, China's bond market has seen a surge in defaults amid regulatory reforms and macroeconomic volatility. Traditional machine learning models struggle to capture the irregularity and temporal dependencies of financial data, while most deep learning models lack interpretability, which is critical for financial decision-making. To tackle these issues, we propose EMDLOT (Explainable Multimodal Deep Learning for Time-series), a novel framework for multi-class bond default prediction. EMDLOT integrates numerical time-series (financial/macroeconomic indicators) and unstructured textual data (bond prospectuses), uses Time-Aware LSTM to handle irregular sequences, and adopts soft clustering and multi-level attention to boost interpretability. Experiments on 1994 Chinese firms (2015-2024) show EMDLOT outperforms traditional (e.g., XGBoost) and deep learning (e.g., LSTM) benchmarks in recall, F1-score, and mAP, especially in identifying default/extended firms. Ablation studies validate each component's value, and attention analyses reveal economically intuitive default drivers. This work provides a practical tool and a trustworthy framework for transparent financial risk modeling.
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Financial News (1.00)
- Research Report > New Finding (0.68)
- Banking & Finance > Trading (1.00)
- Banking & Finance > Economy (1.00)
- Banking & Finance > Credit (0.95)
Interpretable Credit Default Prediction with Ensemble Learning and SHAP
Yang, Shiqi, Huang, Ziyi, Xiao, Wengran, Shen, Xinyu
This study focuses on the problem of credit default prediction, builds a modeling framework based on machine learning, and conducts comparative experiments on a variety of mainstream classification algorithms. Through preprocessing, feature engineering, and model training of the Home Credit dataset, the performance of multiple models including logistic regression, random forest, XGBoost, LightGBM, etc. in terms of accuracy, precision, and recall is evaluated. The results show that the ensemble learning method has obvious advantages in predictive performance, especially in dealing with complex nonlinear relationships between features and data imbalance problems. At the same time, the SHAP method is used to analyze the importance and dependency of features, and it is found that the external credit score variable plays a dominant role in model decision making, which helps to improve the model's interpretability and practical application value. The research results provide effective reference and technical support for the intelligent development of credit risk control systems.
Leveraging Convolutional Neural Network-Transformer Synergy for Predictive Modeling in Risk-Based Applications
Wang, Yuhan, Xu, Zhen, Yao, Yue, Liu, Jinsong, Lin, Jiating
With the development of the financial industry, credit default prediction, as an important task in financial risk management, has received increasing attention. Traditional credit default prediction methods mostly rely on machine learning models, such as decision trees and random forests, but these methods have certain limitations in processing complex data and capturing potential risk patterns. To this end, this paper proposes a deep learning model based on the combination of convolutional neural networks (CNN) and Transformer for credit user default prediction. The model combines the advantages of CNN in local feature extraction with the ability of Transformer in global dependency modeling, effectively improving the accuracy and robustness of credit default prediction. Through experiments on public credit default datasets, the results show that the CNN+Transformer model outperforms traditional machine learning models, such as random forests and XGBoost, in multiple evaluation indicators such as accuracy, AUC, and KS value, demonstrating its powerful ability in complex financial data modeling. Further experimental analysis shows that appropriate optimizer selection and learning rate adjustment play a vital role in improving model performance. In addition, the ablation experiment of the model verifies the advantages of the combination of CNN and Transformer and proves the complementarity of the two in credit default prediction. This study provides a new idea for credit default prediction and provides strong support for risk assessment and intelligent decision-making in the financial field. Future research can further improve the prediction effect and generalization ability by introducing more unstructured data and improving the model architecture.
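The KS value mentioned in this abstract is commonly computed as the largest gap between the cumulative score distributions of defaulters and non-defaulters. The sketch below is a minimal plain-Python illustration under the assumption that this is the definition the authors use; the scores, labels (1 = default), and function name are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Maximum absolute gap between the empirical CDFs of model scores
    for the positive class (label 1) and the negative class (label 0)."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    for threshold in sorted(set(scores)):
        # Fraction of each class scoring at or below the threshold.
        cdf_pos = bisect_right(pos, threshold) / len(pos)
        cdf_neg = bisect_right(neg, threshold) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical predicted default probabilities and true outcomes.
scores = [0.05, 0.20, 0.35, 0.40, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 1]
print(f"KS = {ks_statistic(scores, labels):.3f}")
```

A KS of 0 means the model scores the two groups identically, and a KS of 1 means some threshold separates them perfectly.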
- North America > United States > New York (0.04)
- Asia > Singapore (0.04)
- North America > United States > Maine > Cumberland County > Portland (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Research Report > New Finding (0.49)
- Research Report > Experimental Study (0.34)
- Banking & Finance > Credit (1.00)
- Information Technology > Security & Privacy (0.87)
Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction
With the widespread application of machine learning in financial risk management, conventional wisdom suggests that longer training periods and more feature variables contribute to improved model performance. This paper, focusing on mortgage default prediction, empirically discovers a phenomenon that contradicts traditional knowledge: in time series prediction, increased training data timespan and additional non-critical features actually lead to significant deterioration in prediction effectiveness. Using Fannie Mae's mortgage data, the study compares predictive performance across different time window lengths (2012-2022) and feature combinations, revealing that shorter time windows (such as single-year periods) paired with carefully selected key features yield superior prediction results. The experimental results indicate that extended time spans may introduce noise from historical data and outdated market patterns, while excessive non-critical features interfere with the model's learning of core default factors. This research not only challenges the traditional "more is better" approach in data modeling but also provides new insights and practical guidance for feature selection and time window optimization in financial risk prediction.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
- Banking & Finance > Loans > Mortgages (0.68)
- Government > Regional Government > North America Government > United States Government (0.34)
KACDP: A Highly Interpretable Credit Default Prediction Model
Individual credit risk prediction has become a crucial part of risk management in financial institutions. Accurate default prediction not only helps financial institutions reduce losses but also improves the utilization rate of funds, thereby enhancing their competitiveness in the market [1] [2]. With the rapid development of financial technology, machine learning and deep learning techniques are increasingly applied in credit risk assessment. However, existing methods show limitations when dealing with high-dimensional and nonlinear data, the most prominent being problems of interpretability and transparency [3]. Traditional credit risk prediction methods fall into two categories: statistical models and machine learning models. Statistical models, typified by logistic regression [4], are simple and easy to use, but their relatively strict assumptions often make it difficult to capture nonlinear relationships in complex data. Machine learning models such as Random Forest (RF) [5], Support Vector Machine (SVM) [6], and Extreme Gradient Boosting Machine (XGBoost) [7] perform relatively well on high-dimensional data, but their interpretability is relatively poor and they struggle to provide a clear and transparent decision-making process. Deep learning models such as Multi-Layer Perceptron (MLP) [8] and Recurrent Neural Network (RNN) [9] have strong expressive ability, but in practical financial applications their black-box nature leaves them severely lacking in transparency and interpretability, which is a major problem in the strictly regulated financial industry [10].
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Credit (1.00)
Artificial intelligence-based blockchain-driven financial default prediction
With the rapid development of technology, blockchain and artificial intelligence are playing a growing role in many industries. In the financial sector, blockchain addresses many security problems of data storage and management in traditional systems through decentralization. Artificial intelligence, in turn, offers strong advantages in financial forecasting and risk management through its algorithmic modeling capabilities. Financial default prediction that combines blockchain and artificial intelligence is a powerful application of both. Blockchain technology guarantees the credibility of data and its consistency across all nodes, and machine learning builds a default prediction model through detailed analysis of big data. This study offers financial institutions new ideas on using financial technology for credit risk mitigation and financial system stabilization.
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.05)
- Asia > China > Zhejiang Province > Ningbo (0.05)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
Financial Default Prediction via Motif-preserving Graph Neural Network with Curriculum Learning
Wang, Daixin, Zhang, Zhiqiang, Zhao, Yeyu, Huang, Kai, Kang, Yulin, Zhou, Jun
User financial default prediction plays a critical role in credit risk forecasting and management. It aims to predict the probability that a user will fail to make repayments in the future. Previous methods mainly extract a set of individual user features from profiles and behaviors and build a binary classification model to make default predictions. However, these methods do not achieve satisfactory results, especially for users with limited information. Although recent efforts suggest that default prediction can be improved by social relations, they fail to capture the higher-order topology structure at the level of small subgraph patterns. In this paper, we fill this gap by proposing a motif-preserving Graph Neural Network with curriculum learning (MotifGNN) to jointly learn the lower-order structures from the original graph and higher-order structures from multi-view motif-based graphs for financial default prediction. Specifically, to solve the problem of weak connectivity in motif-based graphs, we design a motif-based gating mechanism. It utilizes the information learned from the original graph, which has good connectivity, to strengthen the learning of the higher-order structure. Considering that the motif patterns of different samples are highly unbalanced, we also propose a curriculum learning mechanism over the whole learning process that focuses more on the samples with uncommon motif distributions. Extensive experiments on one public dataset and two industrial datasets demonstrate the effectiveness of our proposed method.
- North America > United States > California > Los Angeles County > Long Beach (0.05)
- Asia > China (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Montserrat (0.04)
- Banking & Finance > Credit (0.49)
- Information Technology > Security & Privacy (0.46)
A machine learning workflow to address credit default prediction
Rahmani, Rambod, Parola, Marco, Cimino, Mario G. C. A.
Due to the recent increase in interest in Financial Technology (FinTech), applications like credit default prediction (CDP) are gaining significant industrial and academic attention. CDP plays a crucial role in assessing the creditworthiness of individuals and businesses, enabling lenders to make informed decisions regarding loan approvals and risk management. In this paper, we propose a workflow-based approach to improve CDP, which refers to the task of assessing the probability that a borrower will default on his or her credit obligations. The workflow consists of multiple steps, each designed to leverage the strengths of different techniques featured in machine learning pipelines and thus best solve the CDP task. We employ a comprehensive and systematic approach starting with data preprocessing using Weight of Evidence encoding, a technique that performs data scaling in a single shot by removing outliers, handling missing values, and making data uniform for models working with different data types. Next, we train several families of learning models, introducing ensemble techniques to build more robust models and hyperparameter optimization via multi-objective genetic algorithms to consider both predictive accuracy and financial aspects. Our research aims to contribute to the FinTech industry by providing a tool to move toward more accurate and reliable credit risk assessment, benefiting both lenders and borrowers.
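To make the Weight of Evidence (WoE) step concrete, here is a minimal sketch for a single categorical feature. It assumes the common convention WoE = ln(share of non-defaulters in the category / share of defaulters in the category) plus a small additive smoothing term to avoid division by zero; the sign convention, the smoothing value, the function name, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def woe_table(categories, defaults, smoothing=0.5):
    """Map each category to its Weight of Evidence.

    defaults uses 1 for a defaulted borrower and 0 otherwise.
    Positive WoE: the category holds relatively more non-defaulters.
    """
    good, bad = Counter(), Counter()
    for category, defaulted in zip(categories, defaults):
        (bad if defaulted == 1 else good)[category] += 1
    bins = sorted(set(categories))
    # Smoothed class totals so the per-category shares still sum to one.
    total_good = sum(good.values()) + smoothing * len(bins)
    total_bad = sum(bad.values()) + smoothing * len(bins)
    return {
        c: math.log(
            ((good[c] + smoothing) / total_good)
            / ((bad[c] + smoothing) / total_bad)
        )
        for c in bins
    }


# Hypothetical housing-status feature and default outcomes.
housing = ["own"] * 4 + ["rent"] * 4
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]
for category, woe in woe_table(housing, defaulted).items():
    print(f"{category}: {woe:+.3f}")
```

Replacing each raw category with its WoE value puts all features on a comparable log-odds scale, and a missing value can be treated as its own category, which is one way to read the abstract's claim that the encoding handles scaling and missing values together.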
- Europe > Switzerland (0.04)
- Europe > Monaco (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- Workflow (0.93)
- Research Report (0.64)
- Information Technology > Security & Privacy (0.55)
- Banking & Finance > Credit (0.52)